perf: Remove allocation overhead when retrieving labels for Waypoints() and PolygonTriggers()#2761
Conversation
…() and PolygonTriggers()
|
| Filename | Overview |
|---|---|
| Generals/Code/GameEngine/Include/GameLogic/TerrainLogic.h | Changes getPathLabel1/2/3 return type from AsciiString (by value) to const AsciiString& to eliminate per-call heap allocations; all call sites safely copy or compare the result without storing the reference. |
| Generals/Code/GameEngine/Include/GameLogic/PolygonTrigger.h | Changes getTriggerName return type from AsciiString (by value) to const AsciiString& to eliminate per-call heap allocation; existing callers all assign to AsciiString by value or consume the result inline. |
| GeneralsMD/Code/GameEngine/Include/GameLogic/TerrainLogic.h | Zero Hour mirror of the Generals TerrainLogic.h change; identical update to getPathLabel1/2/3 return types. Same safety assessment applies. |
| GeneralsMD/Code/GameEngine/Include/GameLogic/PolygonTrigger.h | Zero Hour mirror of the Generals PolygonTrigger.h change; identical update to getTriggerName return type. Same safety assessment applies. |
Sequence Diagram
sequenceDiagram
participant AI as AI Unit (per frame)
participant SC as ScriptConditions / isPurposeOfPath
participant WP as Waypoint
participant PT as PolygonTrigger
AI->>SC: check waypoint path label
SC->>WP: getPathLabel1() [before: AsciiString copy]
WP-->>SC: "const AsciiString& (no alloc)"
SC->>WP: getPathLabel2()
WP-->>SC: "const AsciiString& (no alloc)"
SC->>WP: getPathLabel3()
WP-->>SC: "const AsciiString& (no alloc)"
AI->>SC: check polygon trigger area
SC->>PT: getTriggerName() [before: AsciiString copy]
PT-->>SC: "const AsciiString& (no alloc)"
Reviews (1): Last reviewed commit: "perf: Remove allocation overhead when re..." | Re-trigger Greptile
xezon
left a comment
There was a problem hiding this comment.
Why does this improve performance? Because of the global ref counter lock?
| PolygonTrigger *getNext() {return m_nextPolygonTrigger;} | ||
| const PolygonTrigger *getNext() const {return m_nextPolygonTrigger;} | ||
| AsciiString getTriggerName() const {return m_triggerName;} ///< Gets the trigger name. | ||
| const AsciiString& getTriggerName() const {return m_triggerName;} ///< Gets the trigger name. |
There was a problem hiding this comment.
This does not allocate. But it does touch the string ref counter.
Callers should also use references then, for example:
PolygonTrigger *TerrainLogic::getTriggerAreaByName( AsciiString name )
{
for (PolygonTrigger* pTrig = PolygonTrigger::getFirstPolygonTrigger(); pTrig; pTrig = pTrig->getNext()) {
AsciiString trigName = pTrig->getTriggerName(); // <---- can take const ref then
if (name == trigName)
return pTrig;
}
return nullptr;
}
This is where the problem lies, a copy is passed into And this is done hundreds to thousands of times per frame if there are a lot of path scripted units in a scene. It is a night and day difference in wave 1 of cobalt rush when a reference is passed from the function instead. CompareNoCase already takes a reference as well. |
|
I still don't see it. Can you explain how it allocates? if (label.compareNoCase(way->getPathLabel1())==0) match = true; |
|
FWIW make sure to measure / benchmark with release builds. |
i know, this does affect release too. |
It is not allocating/deallocating. The cost comes from the critical section, which was added in #799 last year. Better replace the critical section with atomic counting.
|
There was always a critical section, even in the original code, they just called it a "fast critical section". There still needs to be a lock as it is protecting the allocated data and not just the reference counter. |
This is not quite right. EA swapped the Critical Section in AsciiString for Atomic Counting. I expect this was deliberate, to tackle the same kind of performance culprits that you are observing now. inline AsciiString::AsciiString(const AsciiString& stringSrc) : m_data(stringSrc.m_data)
{
// don't need this if we're using InterlockedIncrement
// FastCriticalSectionClass::LockClass lock(TheAsciiStringCriticalSection);
if (m_data)
// ++m_data->m_refCount;
// yes, I know it's not a DWord but we're incrementing so we're safe
InterlockedIncrement((long *)&m_data->m_refCount); // <----- Atomic counter
validate();
}
inline void AsciiString::releaseBuffer()
{
// FastCriticalSectionClass::LockClass lock(TheAsciiStringCriticalSection);
validate();
if (m_data)
{
InterlockedDecrement((long *)&m_data->m_refCount); // <----- Atomic counter
if (!m_data->m_refCount)
freeBytes();
m_data = 0;
}
validate();
}Using atomic counter is the right thing to do. |
I was probably remembering the fast critical section bit that was not used, the thread safety is still a problem though which the atomic counters do not provide. As far as i am aware ascii strings are used between threads in the audio system. Might be wrong though, need to check. The ref counter could always be placed within the Ascii/Unicode String instead of within the data that gets allocated to it. Edit - actually the ref conter cannot be on the ascii/Unicode string object otherwise there could be out of sync copies of it. |
|
What is the concern regarding thread safety? The only write shared access point should be the counter, not the string data. The Atomic Counter is totally sufficient. The only problem currently is that VC6 only has 4 byte atomic counter, but we have a 2 byte counter here. I think 2 byte counter is available in assembler though. |
The reference counter is part of the allocated data object for the string, so if one thread releases the data as another one accesses it, then you have a race condition. |
This is impossible. Ref counter will not reach 0 when 2 threads hold a reference. |
One thread might be trying to gain a reference to the object just as another is releasing it, that's the situation i am considering. |
This is also impossible. If the unique owner is releasing it, there is will be no one else having any opportunity to take ownership. The Atomic counter approach is rock solid - do not worry. |
How is it impossible? if the original thread that owned the string deletes it just at the point that another thread goes to take a reference to it, you can have a race condition on checking if m_data exists. The original owning thread could clear the data and the new thread could just be behind it, pass the m_data check then fault on incrementing the counter as m_data is no longer valid. Unless i am missing something, since the counter is part of the allocated data, i don't see how you cannot end up with a race condition in some circumstances. |
|
You are right that is a data race. The same principles as for |



This PR reduces the allocation overhead seen when waypoint path labels and polygontrigger names are retrieved.
In mod maps with a lot of path scripted ai, such as Cobal Rush, waypoint labels are retrieved and compared every frame.
This will occur for every AI unit that is pathing along the waypoint. Polygon triggers also have a similar issue where they can be repeatedly compared every frame per scripted unit.
The following shows the side effects of this change for waypoint path labels.
Before:

After:

In game the most significant observation was that the first wave in Cobalt Rush went from stuttering to smooth.
The average FPS also significantly increased when observing the first wave and the camera also stopped stuttering.
The above flame graphs were captured in a debug build, but the stuttering still occurs in a release build and is alleviated with the same fix.